PDF document detection model based on system calls and data provenance
Jingwei LEI, Peng YI, Xiang CHEN, Liang WANG, Ming MAO
Journal of Computer Applications, 2022, 42(12): 3831-3840. DOI: 10.11772/j.issn.1001-9081.2021101730

Traditional static and dynamic detection methods cannot cope with malicious PDF document attacks that rely heavily on obfuscation and unknown techniques. To address this problem, a new detection model based on system calls and data provenance, called NtProvenancer, was proposed. First, a system call tracing tool recorded the system calls made while the document was executed. Then, data provenance techniques were used to build a data provenance graph from these system calls. After that, a graph key-point algorithm was used to extract system call feature segments for detection. The experimental dataset consisted of 528 benign and 320 malicious PDF documents. Tests were carried out on Adobe Reader. For comparison, the graph key-point algorithm was replaced with Term Frequency-Inverse Document Frequency (TF-IDF) and with the rarity algorithm from PROVDETECTOR. The results show that NtProvenancer achieves better precision and F1 score than these alternatives. Under the optimal parameter setting, the average document training time is 251.51 ms and the average detection time is 60.55 ms. The false alarm rate is below 5.22% and the F1 score reaches 0.989. These results show that NtProvenancer is an efficient and practical model for detecting malicious PDF documents.
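The abstract does not spell out the graph key-point algorithm. The sketch below therefore illustrates only the general pipeline it describes, under stated assumptions. System call records are turned into a provenance graph, and short causal paths starting at the document node are collected as candidate feature segments. The segments are then weighted with TF-IDF, which the abstract names as one of the comparison baselines, not as NtProvenancer's own method. The record format `(process, syscall, object)`, the read-like syscall set, the path-length limit, the toy traces, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical record format: (process, syscall, object). The Nt* names are
# illustrative Windows native-API calls, not the paper's actual traced set.
READ_LIKE = {"NtReadFile", "NtQueryValueKey"}  # data flows object -> process


def build_provenance_graph(records):
    """Map each node to its (successor, syscall) edges in data-flow direction."""
    graph = defaultdict(list)
    for proc, call, obj in records:
        if call in READ_LIKE:
            graph[obj].append((proc, call))  # object's data flows into the process
        else:
            graph[proc].append((obj, call))  # process writes or creates the object
    return graph


def extract_segments(graph, root, max_len=3):
    """Collect syscall sequences along acyclic paths of 1..max_len edges from root."""
    segments = []

    def dfs(node, path, visited):
        if path:
            segments.append(tuple(path))
        if len(path) == max_len:
            return
        for succ, call in graph.get(node, []):
            if succ not in visited:
                dfs(succ, path + [call], visited | {succ})

    dfs(root, [], {root})
    return segments


def tfidf(docs_segments):
    """Weight each segment per document: term frequency times log(N / document frequency)."""
    n_docs = len(docs_segments)
    df = defaultdict(int)
    for segs in docs_segments:
        for seg in set(segs):
            df[seg] += 1
    weights = []
    for segs in docs_segments:
        counts = defaultdict(int)
        for seg in segs:
            counts[seg] += 1
        weights.append({seg: (c / len(segs)) * math.log(n_docs / df[seg])
                        for seg, c in counts.items()})
    return weights


def top_segments(doc_weights, k=1):
    """Return the k highest-weighted segments of one document."""
    return sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy traces (invented for illustration): the reader opens both documents,
# but only the second one drops and launches an executable.
benign_trace = [
    ("AcroRd32.exe", "NtReadFile", "doc_a.pdf"),
    ("AcroRd32.exe", "NtWriteFile", "cache.tmp"),
]
malicious_trace = [
    ("AcroRd32.exe", "NtReadFile", "doc_b.pdf"),
    ("AcroRd32.exe", "NtWriteFile", "payload.exe"),
    ("AcroRd32.exe", "NtCreateUserProcess", "payload.exe"),
]

docs = [
    extract_segments(build_provenance_graph(benign_trace), "doc_a.pdf"),
    extract_segments(build_provenance_graph(malicious_trace), "doc_b.pdf"),
]
weights = tfidf(docs)
print(top_segments(weights[1])[0][0])  # → ('NtReadFile', 'NtCreateUserProcess')
```

Segments shared by every document receive zero weight. As a result, only the path that ends in launching the dropped executable stands out. The abstract does not describe how the extracted segments are then classified, and that step is not modeled here.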
